The derivative is a simple tool for understanding a mathematical function locally - meaning at and around a single point. More specifically, the derivative at a point defines the best linear approximation to the function there - a line in two dimensions, a hyperplane in higher dimensions - that matches the function at that point as closely as any line or hyperplane can.
Why would someone need, or come up with, such an idea? Because most of the mathematical functions we deal with in machine learning, mathematical optimization, and science in general are too high dimensional for us to examine by eye. Since they live in higher dimensions, we need tools (e.g., calculus) to help us understand and intuit their behavior.
# This code cell will not be shown in the HTML version of this notebook
#imports: autograd and custom library
import sys
sys.path.append('../../')
from mlrefined_libraries import calculus_library as calclib
import autograd.numpy as np
%matplotlib notebook
from matplotlib import rcParams
rcParams['figure.autolayout'] = True
Let us begin exploring this idea in pictures before jumping into the math. Let's examine a few candidate functions - beginning with the standard sinusoid
\begin{equation} g(w) = \text{sin}(w) \end{equation}
Below we draw this function over a small range of its inputs, and then at each point draw the line defined by the function's derivative there on top.
The final result is an animated slider widget - at each increment of the slider the sinusoidal function is drawn in black, the point we are at in red, and the corresponding line produced using the derivative in green. Sliding from left to right moves the point - and the line defined by the derivative there - smoothly across the function.
# what function should we play with? Defined in the next line.
g = lambda w: np.sin(w)
# create an instance of the visualizer with this function
taylor_viz = calclib.taylor2d_viz.visualizer(g = g)
# run the visualizer for our chosen input function
taylor_viz.draw_it(first_order = True,num_frames = 2)
Notice a few things. First, as you adjust the slider, notice how the line produced by the derivative at each point is always tangent to the function. This is true more generally as well: wherever a function has a derivative, the linear approximation it defines is tangent to the function there. Second, notice how the line defined by the derivative hugs the function at every point - its slope seems to match the local steepness of the curve everywhere. This is also true in general: the slope of the tangent line given by the derivative always gives the local steepness - or slope - of the function itself. The derivative naturally encodes this information. Third, notice how at each increment of the slider the tangent line defined by the derivative matches the function itself near the point in red. This is also true in general: the derivative at a point always defines a line that matches the underlying function near that point. In short, the derivative at a point is the slope of the tangent line at that point.
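The three observations above can also be checked numerically. The sketch below is a minimal illustration and not part of the notebook's `calclib` library: it builds the tangent line to $g(w) = \text{sin}(w)$ at a chosen point, using the standard fact that the derivative of sine is cosine, and measures how far the line sits from the function at nearby inputs. The helper name `tangent_line` and the point of tangency $w^0 = 1$ are assumptions made purely for illustration.

```python
import math

def tangent_line(g, dg, w0):
    """Return h(w) = g(w0) + dg(w0) * (w - w0), the tangent line to g at w0."""
    g0, slope = g(w0), dg(w0)
    return lambda w: g0 + slope * (w - w0)

g = math.sin       # the sinusoid from the text
dg = math.cos      # its derivative (standard calculus fact)

w0 = 1.0           # hypothetical point of tangency, chosen for illustration
h = tangent_line(g, dg, w0)

# gap between the function and its tangent line at shrinking distances from w0
steps = [1.0, 0.1, 0.01]
gaps = [abs(g(w0 + s) - h(w0 + s)) for s in steps]
for s, gap in zip(steps, gaps):
    print(f"distance {s:<5} gap {gap:.2e}")
```

The printed gap shrinks much faster than the distance itself, which is one way to see what it means for the tangent line to match the function near the point.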
Let's examine another candidate function using the same widget
\begin{equation} g(w) = \text{sin}(4w) + 0.1w^2 \end{equation}
# what function should we play with? Defined in the next line.
g = lambda w: np.sin(4*w) + 0.1*w**2
# create an instance of the visualizer with this function
taylor_viz = calclib.taylor2d_viz.visualizer(g = g)
# run the visualizer for our chosen input function
taylor_viz.draw_it(first_order = True,num_frames = 20)
Again as you slide from left to right you can see how the line defined by the derivative at each point stays tangent to the curve, hugs the function's shape everywhere, and generally matches the function near the point.
In the image below we show a picture of the sinusoid in the left panel, where we have plugged the input point $w^0 = 0$ into the sinusoid and highlighted the corresponding point $(0, \text{sin}(0))$ in green. In the middle panel we plot another point on the curve - with input $w^1 = -2.6$, the point $(-2.6, \text{sin}(-2.6))$ in blue - and the *secant line* in red formed by connecting $(-2.6, \text{sin}(-2.6))$ and $(0, \text{sin}(0))$. Finally, in the right panel we show the tangent line at $w = 0$ in lime green. The gray vertical dashed lines in the middle panel are there for visualization purposes only.

A secant line is just a line formed by taking any two points on a function - like our sinusoid - and connecting them with a straight line. A tangent line, on the other hand, may pass through several points of a function but is explicitly defined using only a single point. So in short - a secant line is defined by two points, a tangent line by just one.
The equation of any secant line is easy to derive - since all we need is the slope and any point on the line to define it - and the slope of a line can be found using any two points on it (like the two points we used to define the secant to begin with).
The slope - the line's 'steepness' or 'rise over run' - is the ratio of change in output $g(w)$ over the change in input $w$. If we used two generic inputs $w^0$ and $w^1$ - above we chose $w^0 = 0$ and $w^1 = -2.6$ - we can write out the slope of a secant line generally as
\begin{equation} \text{slope of a secant line} = \frac{g(w^1) - g(w^0)}{w^1 - w^0} \end{equation}
Now, using the point-slope form of a line, we can directly write out the equation of a secant using the slope above and either of the two points we used to define the secant to begin with. Using $(w^0, g(w^0))$, the equation of the secant line $h(w)$ is
\begin{equation} h(w) = g(w^0) + \frac{g(w^1) - g(w^0)}{w^1 - w^0}(w - w^0) \end{equation}
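The secant formula above translates almost directly into code. The following is a minimal sketch, assuming a hypothetical helper name `secant_line` that is not part of the notebook's library; it uses Python's standard `math` module rather than `autograd.numpy` so that it runs on its own.

```python
import math

def secant_line(g, w0, w1):
    """Return the secant line h through (w0, g(w0)) and (w1, g(w1))."""
    if w0 == w1:
        raise ValueError("a secant line needs two distinct inputs")
    slope = (g(w1) - g(w0)) / (w1 - w0)         # rise over run
    return lambda w: g(w0) + slope * (w - w0)   # point-slope form

# the two inputs used in the text: w0 = 0 and w1 = -2.6
h = secant_line(math.sin, 0.0, -2.6)
print(h(0.0), math.sin(0.0))      # the line and the function agree at w0
print(h(-2.6), math.sin(-2.6))    # ... and at w1
```

By construction the line agrees with the function at both defining inputs, and in general nowhere else is guaranteed.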
If we think about our green point at $w^0 = 0$ as fixed, then the tangent line at this point can be thought of as the line we get when we shift the blue point very close - infinitely close actually - to the green one.
Taking $w^0 = 0$ and $w^1 = -2.6$ the equation of the secant line connecting $(w^0,\text{sin}(w^0))$ and $(w^1,\text{sin}(w^1))$ on the sinusoid is given as
\begin{equation} h(w) = \text{sin}(0) + \frac{\text{sin}(-2.6) - \text{sin}(0)}{-2.6 - 0}(w - 0) \end{equation}
Since $\text{sin}(0) = 0$ and $\text{sin}(-2.6) \approx -0.5155$ we can write this as
\begin{equation} h(w) = \frac{0.5155}{2.6}w \end{equation}
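As a quick sanity check on the arithmetic above, the short sketch below recomputes the value of $\text{sin}(-2.6)$ and the resulting secant slope using only Python's standard `math` module. The variable names are chosen for illustration.

```python
import math

w0, w1 = 0.0, -2.6

# slope of the secant line through (w0, sin(w0)) and (w1, sin(w1))
slope = (math.sin(w1) - math.sin(w0)) / (w1 - w0)

print(round(math.sin(w1), 4))    # sin(-2.6) rounded to four decimals
print(round(slope, 4))           # secant slope computed from the definition
print(round(0.5155 / 2.6, 4))    # slope as written in the simplified equation
```

The last two printed values should agree to the precision shown, confirming that the two minus signs in the slope cancel.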
Below we show a slider-based animation widget that illustrates the idea of shifting the blue point toward the fixed green one. As you shift the slider from left to right the blue point - along with the red secant line that passes through it and the green point - moves closer and closer to our fixed point. Finally - when the two points lie right on top of each other - the secant line becomes the green tangent line at our fixed point.
# what function should we play with? Defined in the next line, along with our fixed point where we show tangency.
g = lambda w: np.sin(w)
# create an instance of the visualizer with this function
st = calclib.secant_to_tangent.visualizer(g = g)
# run the visualizer for our chosen input function and initial point
st.draw_it(w_init = 0, num_frames = 200)
In sliding back and forth, notice how it does not matter if we start from the left of our fixed point and move right towards it, or start to the right of the fixed point and move left towards it: either way the secant line gradually becomes tangent to the curve at $w^0 = 0$. There is no big 'jump' in the slope of the line if we wiggle the slider ever so slightly to the left or right of the fixed point - the slopes of the nearby secant lines are very close to that of the tangent.
When we can do this - come at a fixed point from either the left or the right and the secant line becomes tangent smoothly from either direction with no jump in the value of the slope - we say that a function has a derivative at this point, or likewise say that it is differentiable at the point.
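This two-sided behavior can be observed numerically as well. The sketch below, a minimal illustration using only the standard `math` module, computes secant slopes for the sinusoid at the fixed point $w^0 = 0$ as the second point approaches from the left and from the right. As a contrast that does not appear in the text, it also evaluates the one-sided secant slopes of the absolute value function at $0$, a standard example where the two sides disagree. The helper name `secant_slope` and the chosen offsets are assumptions for illustration.

```python
import math

def secant_slope(g, w0, w1):
    """Slope of the secant line through (w0, g(w0)) and (w1, g(w1))."""
    return (g(w1) - g(w0)) / (w1 - w0)

w0 = 0.0
offsets = [1.0, 0.1, 0.01, 0.001]

# approach the fixed point from the left (w0 - d) and from the right (w0 + d)
left = [secant_slope(math.sin, w0, w0 - d) for d in offsets]
right = [secant_slope(math.sin, w0, w0 + d) for d in offsets]
for d, l, r in zip(offsets, left, right):
    print(f"offset {d:<6} left slope {l:.6f}  right slope {r:.6f}")

# contrast (not from the text): for |w| the one-sided slopes at 0 never agree
left_abs = secant_slope(abs, w0, w0 - 0.001)
right_abs = secant_slope(abs, w0, w0 + 0.001)
print(left_abs, right_abs)
```

For the sinusoid both columns settle on the same value, while for the absolute value the left and right slopes stay apart no matter how small the offset, which is the kind of 'jump' that rules out a derivative at a point.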
Many functions like our sinusoid, other trigonometric functions, and polynomials are differentiable at every point - or just differentiable for short. In the Jupyter notebook version of this Section you can tinker around with the previous Python cell - pick another fixed point! - and see this for yourself. You can also tinker around with the function - for example in the next cell we show - using the same slider mechanism - that the function
\begin{equation} g(w) = \text{tanh}(w)^2 \end{equation}
has a derivative at the point $w^0 = 1$.
# what function should we play with? Defined in the next line, along with our fixed point where we show tangency.
g = lambda w: np.tanh(w)**2
# create an instance of the visualizer with this function
st = calclib.secant_to_tangent.visualizer(g = g)
# run the visualizer for our chosen input function and initial point
st.draw_it(w_init = 1, num_frames = 300)
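The same claim can be checked without the widget. The sketch below is a minimal illustration using the standard `math` module: it compares secant slopes taken just to the left and just to the right of $w^0 = 1$ with the derivative obtained by hand from the chain rule, $2\,\text{tanh}(w)\left(1 - \text{tanh}(w)^2\right)$. The offset `d` and the variable names are assumptions chosen for illustration.

```python
import math

g = lambda w: math.tanh(w) ** 2
# derivative worked out by hand with the chain rule
dg = lambda w: 2 * math.tanh(w) * (1 - math.tanh(w) ** 2)

w0 = 1.0
d = 1e-5   # small, hypothetical offset on either side of the fixed point

left_slope = (g(w0 - d) - g(w0)) / (-d)    # secant from the left
right_slope = (g(w0 + d) - g(w0)) / d      # secant from the right
print(left_slope, right_slope, dg(w0))
```

All three printed numbers should agree to several decimal places, consistent with the secant line becoming tangent smoothly from either direction.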